Thompson Sampling for Linear-Quadratic Control Problems

Authors

  • Marc Abeille
  • Alessandro Lazaric
Abstract

We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics are linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the empirical and theoretical success of TS in a wide range of problems, from multi-armed bandits to linear bandits, we show that when studying the frequentist regret of TS in control problems, we need to trade off the frequency of sampling optimistic parameters against the frequency of switches in the control policy. This results in an overall regret of $O(T^{2/3})$, which is significantly worse than the regret $O(\sqrt{T})$ achieved by the optimism-in-the-face-of-uncertainty algorithm in LQ control problems.
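To make the setting in the abstract concrete, here is a minimal sketch of Thompson sampling on a scalar LQ problem with dynamics x' = a x + b u + noise and cost x^2 + u^2. It is not the authors' algorithm: the admissible box (`A_MAX`, `B_MIN`, `B_MAX`), the clipping fallback, the unit prior, the update period `tau`, and all function names are assumptions made for illustration. The policy is resampled only every `tau` steps, which is one simple way to expose the tradeoff between sampling frequency and policy switches.

```python
import math
import random

# Hypothetical admissible box for sampled parameters (an assumption of this sketch).
A_MAX, B_MIN, B_MAX = 1.5, 0.2, 2.0

def riccati_scalar(a, b, q=1.0, r=1.0, iters=200):
    """Iterate the scalar discrete-time Riccati recursion.
    Returns (p, k): cost-to-go coefficient and gain of the policy u = k * x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = -(a * b * p) / (r + b * b * p)
    return p, k

def sample_theta(V, s, noise_std, rng):
    """Draw (a, b) from N(theta_hat, noise_std^2 * V^{-1}), theta_hat = V^{-1} s."""
    v11, v12, v22 = V[0][0], V[0][1], V[1][1]
    det = max(v11 * v22 - v12 * v12, 1e-12)  # guard against round-off
    c11, c12, c22 = v22 / det, -v12 / det, v11 / det  # entries of V^{-1}
    a_hat = c11 * s[0] + c12 * s[1]
    b_hat = c12 * s[0] + c22 * s[1]
    # Cholesky factor of V^{-1} maps two standard normals to the perturbation.
    l11 = math.sqrt(c11)
    l21 = c12 / l11
    l22 = math.sqrt(max(c22 - l21 * l21, 0.0))
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (a_hat + noise_std * l11 * e1,
            b_hat + noise_std * (l21 * e1 + l22 * e2))

def sample_admissible(V, s, noise_std, rng, max_tries=1000):
    """Rejection-sample until the draw lands in the admissible box."""
    for _ in range(max_tries):
        a, b = sample_theta(V, s, noise_std, rng)
        if abs(a) <= A_MAX and B_MIN <= b <= B_MAX:
            return a, b
    # Fallback (sketch-only assumption): clip the last draw into the box.
    return max(-A_MAX, min(A_MAX, a)), max(B_MIN, min(B_MAX, b))

def run_ts_lq(a_true=0.9, b_true=0.5, T=2000, tau=20, noise_std=1.0, seed=0):
    """Run lazy Thompson sampling for T steps; return the empirical regret,
    i.e. the accumulated cost minus T times the optimal average cost."""
    rng = random.Random(seed)
    V = [[1.0, 0.0], [0.0, 1.0]]  # regularised design matrix, z = (x, u)
    s = [0.0, 0.0]                # running sum of z * x_next
    x, k, total_cost = 0.0, 0.0, 0.0
    for t in range(T):
        if t % tau == 0:  # resample parameters and switch policy every tau steps
            a_s, b_s = sample_admissible(V, s, noise_std, rng)
            _, k = riccati_scalar(a_s, b_s)
        u = k * x
        total_cost += x * x + u * u
        x_next = a_true * x + b_true * u + rng.gauss(0.0, noise_std)
        # Regularised least-squares statistics.
        V[0][0] += x * x
        V[0][1] += x * u
        V[1][0] += x * u
        V[1][1] += u * u
        s[0] += x * x_next
        s[1] += u * x_next
        x = x_next
    p_star, _ = riccati_scalar(a_true, b_true)
    return total_cost - T * p_star * noise_std ** 2
```

Calling `run_ts_lq(T=2000, tau=20)` returns the empirical regret of a single run. Varying `tau` and averaging over seeds is a crude way to probe how the policy-update frequency affects regret in this toy setting; it does not reproduce the rates stated in the abstract.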

Similar resources

Haar Matrix Equations for Solving Time-Variant Linear-Quadratic Optimal Control Problems

In this paper, Haar wavelets are used to solve continuous time-variant linear-quadratic optimal control problems. First, using the necessary conditions for optimality, the problem is converted into a two-point boundary value problem (TBVP). Next, Haar wavelets are applied to convert the TBVP, as a system of differential equations, into a system of matrix algebraic equations...

Linear-quadratic optimal sampled-data control problems: Convergence result and Riccati theory

We consider a general linear control system and a general quadratic cost, where the state evolves continuously in time and the control is sampled, i.e., is piecewise constant over a subdivision of the time interval. This is the framework of a linear-quadratic optimal sampled-data control problem. As a first result, we prove that, as the sampling periods tend to zero, the optimal sampled-data con...

AN OPTIMAL FUZZY SLIDING MODE CONTROLLER DESIGN BASED ON PARTICLE SWARM OPTIMIZATION AND USING SCALAR SIGN FUNCTION

This paper addresses the problems caused by an inappropriate selection of sliding surface parameters in fuzzy sliding mode controllers via an optimization approach. In particular, the proposed method employs the parallel distributed compensator scheme to design the state-feedback-based control law. The controller gains are determined offline via a linear quadratic regulator. The particle ...

A NEW APPROACH FOR SOLVING FULLY FUZZY QUADRATIC PROGRAMMING PROBLEMS

Quadratic programming (QP) is an optimization problem in which one minimizes (or maximizes) a quadratic function of a finite number of decision variables subject to a finite number of linear inequality and/or equality constraints. In this paper, a fully fuzzy quadratic programming (FFQP) problem is considered in which all cost coefficients, constraint coefficients, and right-hand sides are characterized by ...

Linear Thompson Sampling Revisited

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\tilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and h...

Publication date: 2017